-
Notifications
You must be signed in to change notification settings - Fork 357
[parser.c] Various string unescape optimizations #920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…tra memchr searching the remaining characters when no more backslashes exist.
…AArch64 architecture.
…arch64 platforms.
| signed char b0 = digit_values[p[0]]; | ||
| signed char b1 = digit_values[p[1]]; | ||
| signed char b2 = digit_values[p[2]]; | ||
| signed char b3 = digit_values[p[3]]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a risk of reading past the end of the buffer here.
Prior digit_values[p[0]] < 0 would return if p[0] is a NULL byte.
I think this optimization does make sense, but we need to ensure we can actually read 4 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I checked all the callsites, they all already ensure that.
So I think I'm good with that optimization, except I think I'd like to refactor the bound check inside the function.
To be frank with you, my opinion on the matter is that Apple hardware is overwhelmingly used in development, almost never in production, so it's not really worth taking on extra complexity just for that environment, unless of course the gain is massive. I do however like your two optimizations that aren't specific to macOS+aarch64. |
+1. I support this decision. Thanks again for your stewardship of this gem! |
This PR contains 4 commits which optimize
json_string_unescape. Admittedly, I almost didn't create this PR as I don't love the Apple Aarch64-specific code... however I decided to leave it up to the maintainers of this library to decide if this code is worth accepting.I omitted the SWAR / ARM Neon unless the code is being compiled on Apple platforms as a quick benchmark on my x86 laptop with the SWAR
find_backslashand customjson_memcpycode showed it was no fastermaster. Looking at thememcpyandmemchron Ubuntu showed a AVX optimized libc. At the time the code was structured a bit differently and it was only copyingsizeof(uint64_t)bytes at a time on x86. I could try implementingfind_backslashandjson_memcpyusing SSE2 if desired. I should note that onmemchron macOS does seem to be optimized using ARM Neon instructions. Looking at the assembly, the implementation is similar though not quite the same. I think the performance improvement comes from reduced function call overhead / branching asfind_backslashesends up inlined.With respect to keeping track of the
additional_backslashesinstead ofhas_more, if the string to unescape ends with a long run of characters that do not have any more backslashes, we can save amemchrand directly copy to the end of the string.I'm cool if you decide not to accept any of these and simply close this PR. These are just some random things I've been testing out.
Benchmarks
These were run on an Macbook Air M1.
With only the
addtional_backslashescommit:The benchmark:
additional_backslashesandfind_backslashes:additional_backslashes,find_backslashes,json_memcpy:additional_backslashes,find_backslashes,json_memcpy,unescape_unicode: